The general question examined by this study was
whether the tendency of subjects to ignore the known score in giving
the best guess for a sample mean was due to a descriptive heuristic
such as representativeness or to a mechanistic one such as active
balancing. Two experiments were conducted. In Experiment 1, subjects
estimated (1) the mean of a random sample of ten scores consisting
of nine unknown scores and a known score that was divergent from the
population mean, and (2) the mean of the nine unknown scores. The
modal answer (about 40% of the responses) for both sample means was
the population mean. The results extend the work of Tversky and
Kahneman (1971) by demonstrating that subjects hold a passive,
descriptive view of random sampling rather than an active balancing
model. This result was explored further in in-depth interviews
(Experiment 2), wherein subjects solved the problem while explaining
their reasoning. The interview data replicated Experiment 1 and
further showed (a) that subjects' solutions were fairly stable: when
presented with alternative solutions including the correct one, few
subjects changed their answers; (b) little evidence of a balancing
mechanism; and (c) that acceptance of both means as 400 is largely a
result of the perceived unpredictability of "random samples."
Beliefs Underlying Random Sampling

There is at present a large body of evidence indicating that students
believe that random samples resemble the population from which they are
drawn. If the sample size is sufficiently large, then a random sample
will, in fact, tend to be similar to the population from which it is drawn.
Where the typical student apparently differs from the normative model of
statistics is that he or she believes that small as well as large samples
have a high probability of looking like the population. Tversky and
Kahneman (1971) have dubbed this misconception "The Law of Small Numbers."
They proposed that a heuristic or belief called "representativeness"
underlies this misconception. "A person who follows this heuristic evaluates
the probability of an uncertain event, or a sample, by the degree to which
it is: (i) similar in essential properties to its parent population; and
(ii) reflects the salient features of the process by which it is generated"
(Kahneman & Tversky, 1972, p. 431).

One source of evidence for this misconception has come from investigation
of what is popularly known as the "gambler's fallacy." A simple example
of the gambler's fallacy is the belief that if a fair coin has come up
heads a large number of times in a row, then there is an increased chance
that it will come up tails on the next flip. The gambler's fallacy can
be described as the belief that in random sampling, the data that have
already been sampled will influence the data that are yet to be sampled.
This, of course, violates independence, which is a fundamental property
of true random sampling. In real-life coin flipping, shaking the coin
well between flips would guarantee some reasonable approximation to
independence from one flip to another.



The prototypical problem used by Tversky and Kahneman (1971) to explore
the gambler's fallacy is the following.

The mean IQ of the population of eighth graders
in a city is known to be 100. You have selected a
random sample of 50 children for a study of educational
achievements. The first child tested has an IQ of
150. What do you expect the mean IQ to be for the
whole sample?

If the sampling were random, then the best guess for the mean score of
the next 49 children sampled is 100. Therefore the best guess for the
entire sample of 50 children is the weighted mean of 150 and 100, or 101.
However, the typical answer to this problem is 100. This finding reflects
the gambler's fallacy because the answer of "100" violates the assumption
of independence. Answering "100" logically implies that the mean of the
next 49 children is influenced by the score of the first child sampled.
It is not known whether subjects realize that this implication follows
from their answer, or whether the implication is a critical component of
the representativeness heuristic. Before discussing this question, we
must briefly discuss other evidence for representativeness.
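The normative arithmetic can be made explicit with a short Python sketch. It assumes, as the problem states, independent random sampling, so that each of the 49 unknown scores has an expected value equal to the population mean; the function name `expected_sample_mean` is ours and is used for illustration only.

```python
def expected_sample_mean(known_score, population_mean, sample_size):
    """Best guess for the mean of a random sample when one score is already known.

    Assumes independent sampling: each of the (sample_size - 1) unknown scores
    is expected to equal the population mean, so the best guess is the weighted
    mean of the known score and the population mean.
    """
    n_unknown = sample_size - 1
    return (known_score + n_unknown * population_mean) / sample_size


# Tversky and Kahneman's IQ problem: population mean 100, first child 150, n = 50
print(expected_sample_mean(150, 100, 50))  # 101.0
```

The single divergent score moves the best guess only by (150 - 100) / 50 = 1 point, which is why an answer of 100 is easy to mistake for the correct one.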

Kahneman and Tversky (1972) and Bar-Hillel (1980) have employed a
second paradigm to demonstrate the heuristic of representativeness.
Typically, the subject is shown two samples and asked to judge which is
more likely. In their original work, Kahneman and Tversky (1972) dealt
with events modelled by Bernoulli trials. They found, for example, that
subjects thought that for a sequence of six births, the exact order
G B G B B G is more likely than the order B G B B B B, presumably because
the sequence with five boys and one girl fails to reflect the proportion
of boys and girls in the population. Subjects also estimated that the
probability of a sequence like B B B G G G was less than that of G B B G
B G, presumably because the former appears less random. Bar-Hillel (1980)
has extended this research to determine which characteristics of samples
subjects attend to when they judge one sample to be more or less
likely than another. She found that subjects think that a sample should
have not only about the same mean as the population, but also about the
same standard deviation.

The evidence thus is compelling that subjects believe that even small 
samples should look like the population and that a random sample should 
look random. Our interest is in determining whether the heuristic of 
representativeness is a fundamental belief, or axiom, in the layman's 
theory of random samples, or whether it is deducible from some more basic
mechanistic belief. This distinction will become clearer if we digress 
for a moment and speculate about how an expert thinks about large samples. 

Presumably, an expert's fundamental conception of random variables
and random sampling is a process model. Perhaps the most widely used
model is the "urn-drawing" or "box" model, in which random sampling is
viewed as isomorphic to the process of drawing labeled balls or slips of
paper from an urn or box, replacing them, shaking well, and then drawing
again. From this model, the idealization of which can be summarized by
algebraic expressions, certain conclusions follow. These include the
"Law of Large Numbers," which says (roughly) that if a random sample is
large enough, the relative frequencies of outcomes in the sample have a
very high probability of being a close approximation to those in the
population. It is likely that in dealing with large samples, the expert
simply appeals to the property of representativeness derivable from the
Law of Large Numbers, rather than conceptualizing random sampling in terms
of a process. However, if challenged, or if some absurd consequence arose
from an attempted application of this intuitive version of the Law of Large
Numbers, the expert could go back to the more basic process model of sampling
to check whether the consequence did in fact follow from probability theory.

The evidence shows that novices are likely to believe that small, as
well as large, samples are representative. (There are data indicating that
experts overapply representativeness as well; Tversky & Kahneman, 1971.)
This belief could plausibly follow from one of two basic heuristics. The
first possibility is that representativeness itself is the basic heuristic.
In other words, the basic heuristic in thinking about random samples is
descriptive: random samples look approximately like the population, and,
furthermore, random sequences of events look "random." There is a second
possibility, however. Subjects could have an erroneous process model of
random samples from which the representativeness of even small samples follows
as a conclusion, just as the heuristic of representativeness for large
samples could follow from the correct urn-drawing heuristic of the expert.
What might such a process model be? One that has been suggested in statistics
books (e.g., Freedman, Pisani & Purves, 1978, Ch. 16; Hays, 1981, Ch. 1) is
"active balancing" or "compensation," specifically, that some active process
guarantees that things will even out in the long run. Apparently, such a
belief is expressed in the coin-flipping example of the gambler's fallacy when
the subject predicts that, following a run of tails, the next coin is likely
to come up heads. The idea that things will "even out" suggests a notion of
active balancing.
However, the heuristic of active balancing might be deduced from the
heuristic of representativeness. If, in the coin example, the subject
believes that samples should look like the population of outcomes of flips,
then samples that are close to half heads and half tails will be the most
representative. If one has already observed nine heads and is predicting
the outcome of the tenth flip, then presumably a sample of nine heads and
one tail will be more representative of the population than a sample of
ten heads, so that the outcome of "tail" on the tenth trial should be more
likely than "head."

On what basis can one decide whether the representativeness (i.e.,
descriptive) or active-balancing heuristic is the more basic? In the coin
example mentioned above, both heuristics would predict that a head would
be more likely to turn up following a run of tails. However, situations
exist in which the active-balancing and the representativeness heuristics
lead to different predictions. Consider the Tversky and Kahneman (1971)
IQ example mentioned earlier. Again, both heuristics would predict an
answer of 100. However, if asked to predict the mean IQ of the last 49
students in the sample, subjects who thought that all samples should look
like the population would give an answer of 100, but those who employed
an active-balancing heuristic would give an answer smaller than 100 (so
that the entire sample of 50 scores could average 100).
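The diverging predictions on the follow-up question can be sketched in Python. This is an illustrative model under our own assumption that a "balancer" requires the whole sample to average exactly the population mean; `predicted_mean_of_rest` is a hypothetical helper, not a procedure from the original studies.

```python
def predicted_mean_of_rest(known_score, population_mean, sample_size, heuristic):
    """Mean predicted for the (sample_size - 1) unknown scores under a heuristic."""
    n_rest = sample_size - 1
    if heuristic == "representativeness":
        # Any sample, including the remaining scores, should resemble the population.
        return population_mean
    if heuristic == "balancing":
        # The remaining scores must compensate so that the whole sample
        # averages exactly the population mean.
        return (sample_size * population_mean - known_score) / n_rest
    raise ValueError(f"unknown heuristic: {heuristic!r}")


for h in ("representativeness", "balancing"):
    print(h, round(predicted_mean_of_rest(150, 100, 50, h), 2))
```

For the IQ example, the balancing model gives (50 x 100 - 150) / 49, or about 98.98, for the last 49 children, whereas representativeness gives 100; only the second question separates the two.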

The present study extends the Tversky and Kahneman (1971) study by
employing an additional follow-up question about the mean of the sample
excluding the known score. Additionally, we were concerned that subjects
might think of 101 as being approximately 100, and thus answer "100" even
though they knew the mean would be slightly higher than 100. Accordingly,
in our problems the sample size was made smaller so that the difference
between the correct answer and the population mean would be more salient.
Another feature of our experiments was to have some subjects "think out
loud" so that we could better understand the heuristics they were employing.

Experiment 1
Method

Materials. Two problems were employed. One was a variant of the
Tversky and Kahneman IQ problem stated above.
IQ Problem 

The average IQ of the population of eighth graders in a city is
known to be 100. You have selected a random sample of 10
children for a study in educational achievement. The first
child tested has an IQ of 150. What do you expect the average
IQ to be for the whole sample?

What do you expect the average IQ to be for the next 9
children, not including the 150?

(The correct solution to the first question is 105, to
the second, 100.)

The second problem employed was similar but used an SAT rather than an
IQ cover story.



SAT Problem 

The average SAT for all the high school students in a large
school district is known to be 400. You have randomly picked
10 students for a study in educational achievement. The first
student you picked had an SAT of 250. What do you expect
the average SAT to be for the entire sample of 10?

What do you expect the average SAT to be for the next 9
students, not including the 250?

(The correct solution to the first question is 385, to
the second, 400.)
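Both parenthetical solutions follow from the same weighted-mean reasoning. The sketch below assumes independent sampling and uses a hypothetical helper, `normative_answers`, that returns the pair of best guesses (whole sample, remaining nine) for each cover story.

```python
def normative_answers(known_score, population_mean, sample_size):
    """Return (best guess for the whole-sample mean, best guess for the mean of the rest)."""
    n_rest = sample_size - 1
    # Independence: the unknown scores are expected to average the population mean.
    mean_of_rest = population_mean
    mean_of_whole = (known_score + n_rest * mean_of_rest) / sample_size
    return mean_of_whole, mean_of_rest


print(normative_answers(150, 100, 10))  # IQ problem: (105.0, 100)
print(normative_answers(250, 400, 10))  # SAT problem: (385.0, 400)
```

With a sample of 10 rather than 50, the known score shifts the correct answer by 5 IQ points or 15 SAT points, which makes it harder to treat the population mean as "close enough."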



Subjects. The subjects were undergraduates at the University of
Massachusetts who were enrolled in psychology courses. The 31 subjects
who were interviewed were selected from a pool of student volunteers and
received bonus class credit for their participation. The 205 students who
filled out questionnaires did so during a regular class session and were told
that they would be helping us to understand how people think about statistics.
No subject participated in both the questionnaire and interview phases.
Both phases contained approximately equal numbers of males and females.

Procedure. The questionnaire was administered to four undergraduate
psychology statistics classes and took about ten minutes to complete. The
SAT problem was the first of three problems on the questionnaire, and both
parts of the SAT problem appeared together on a single page.

In the interview phase, the subject was given either the SAT or the
IQ problem, as well as several other unrelated problems that will not be
discussed in this paper. A subject was given a sheet of paper containing
the first paragraph of the problem and asked to read it out loud, so that
the experimenter knew that it had been read correctly. The subject then
answered the first question, thinking aloud as much as possible. When
he or she had given an answer, the interviewer orally presented the
second part of the problem. The interviewer then asked follow-up questions
designed to further elucidate what subjects were thinking. The session
lasted about one hour, and approximately 10 to 15 minutes were spent on
one of the two problems discussed here.
Results and Discussion

The data are displayed in Table 1. For the questionnaire subjects,
the numerical answers were tabulated. For the interviewed subjects, the
numerical answers, before any interviewer intervention, were obtained
from videotapes. Several features are apparent. First, the answer predicted
by representativeness, namely that the means of both samples are
equal to the population mean, is the modal answer. It was given by 33%
of the subjects answering the questionnaires and 48% of the subjects in
the interviews. Second, there is considerable variation in the answers
given by subjects. Twenty-one percent of the subjects gave the correct
solution and only 13% of the subjects gave an answer consistent with a
balancing heuristic.



Insert Table 1 about here 



In addition, 33% of the questionnaire subjects and 13% of the interview
subjects gave answers inconsistent with the correct solution, representativeness,
or balancing. The fact that most of these "deviant" answers
occurred in the questionnaire situation suggests that many of them are a
result of not reading the question carefully enough, thus misunderstanding
it on a trivial level. Many of these subjects reported a best guess of
greater than 400 for the sample of 10, which seems uninterpretable except
as a misreading of the question. However, one pattern (labeled "Trend" in
Table 1) deserves some comment, because it appeared in the interviews and
has a plausible underlying rationale. In this pattern, subjects thought
(correctly) that the mean of the sample of 10 would be lower than 400.
In addition, the two means they gave were consistent, in that the mean of
ten could be the average of the first observation and the average of the
next nine observations. However, it departed from the correct statistical
answer in that the mean of the next nine students was also thought to be
less than 400. Comments from the two subjects in the interviews who showed
this pattern of responses indicated that the divergent first score led
them to believe that the population mean was not actually 400 as stated
in the problem.

In summary, the present results replicate those of Kahneman and Tversky
(1972) in that the modal estimate of the mean of the sample of 10 was the
population mean. More importantly, 71% of the 95 questionnaire subjects
and 71% of the 21 interview subjects who gave the population mean for the
mean of the sample of 10 also gave the population mean as their best guess
of the mean of the 9 unknown scores. The percentage for each group was
significantly greater than 50%, χ²(1) = 26.5, p < .001, and χ²(1) =
3.86, p < .05, respectively. This answer was inconsistent with a balancing
heuristic and indicated that these subjects thought that both the sample
of 10 students and the sample of 9 students were representative. Moreover,
representativeness could even be the fundamental heuristic for subjects
classified as "balancers." Using the argument in the introduction, one
could claim that these subjects took the sample of ten as fundamental,
believing that it should be representative, and then demanded enough
consistency of their predictions to make the mean of the sample of nine
consistent with their answer for the mean of the sample of ten. On the
other hand, it is possible that subjects who give answers consistent with
a balancing solution think fundamentally differently about the problem
than the subjects who give a representativeness answer.
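The interview statistic can be checked with a one-degree-of-freedom chi-square goodness-of-fit test against an even split. This sketch assumes that is the test used; 71% of 21 subjects corresponds to 15 subjects, since no other whole number out of 21 rounds to 71%. The helper name `chi_square_vs_even_split` is ours.

```python
import math


def chi_square_vs_even_split(count_yes, n):
    """Chi-square goodness-of-fit test (df = 1) of a count against a 50/50 split.

    Returns the statistic and its upper-tail p value; for one degree of
    freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    """
    expected = n / 2
    count_no = n - count_yes
    chi2 = ((count_yes - expected) ** 2 + (count_no - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Interview group: 15 of 21 subjects also gave the population mean for the sample of 9.
chi2, p = chi_square_vs_even_split(15, 21)
print(f"chi2(1) = {chi2:.2f}, p = {p:.3f}")
```

The statistic is (4.5² + 4.5²) / 10.5 ≈ 3.86, just above the .05 critical value of 3.84, matching the value reported for the interview subjects.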

We had hoped that in-depth analyses of the interview videotapes would
provide further insights into subjects' heuristics. Unfortunately, audio
problems with the recording equipment made evaluating some protocols
extremely difficult. Accordingly, a second set of interviews was conducted
with new equipment. In these interviews, a relatively standardized set
of probe questions was developed, based on an analysis of the most informative
probes used in the first set of interviews. The focus of the more
standardized interviews was to confront subjects with solutions different
from their own. We believed that information could be obtained from this
confrontation that would be difficult to obtain from a more objective format.
First, the strength of subjects' confidence in their answers could be assessed.
If they maintained their solution after being shown reasonable alternatives,
then one could conclude that their original answer was not frivolous.
Second, since subjects were given only the alternative numerical solutions
and were asked what they thought the rationale was for those solutions,
their understanding of the problem could be assessed more fully.

Experiment 2
Method

Subjects. The subjects were 26 students recruited from undergraduate
psychology classes who participated in the experiments for extra credit.
The interview of one subject, whose data are not reported, was stopped in
the middle, since she appeared to be very anxious in the interview situation.




Materials. The SAT problem was used for all subjects. For subjects
1-11, the problem was identical to the one cited in Experiment 1. For
subjects 12-25, the only difference in the problem was that the first
person sampled was said to have an SAT score of 550 instead of 250.
(Correct answer = 415 for the mean of the sample of 10.)

Procedure. The general interview procedure was as in Experiment 1.
The subject read the first question, which asked for the best guess for
the mean of the sample of ten, and answered it, being encouraged to think
out loud as much as possible. After the subject's answer, the interviewer
asked for the best guess for the mean of the sample of nine. Up to the
point of the subject answering this second question, the interviewer did
not intervene except to clarify parts of the problem on request, to correct
the subject if he or she misread the question, or to encourage the subject
to think out loud. The subject's answer (assuming the first score was
250) was classified by the interviewer as: a) demonstrating the correct
rationale (if the answers to the questions were less than 400 and 400);
b) demonstrating representativeness (if both answers were 400); c) demonstrating
balancing (if the answers were 400 and greater than 400).
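The interviewer's three-way coding rule can be written as a small function. The sketch rests on two assumptions of our own: answers are compared with the population mean by exact equality (in the study, an answer was coded as 400 only when the subject saw no tendency in either direction), and the rule is generalized to a known score above the mean, as in the 550 version, by reversing the direction of the expected shifts. Patterns such as the "trend" answer fall into `"other"`; `classify_answers` is a hypothetical name.

```python
def classify_answers(mean_of_10, mean_of_9, population_mean=400, known_score=250):
    """Code a pair of best guesses as correct, representativeness, balancing, or other."""
    # +1 if the known score lies above the population mean, -1 if below.
    direction = 1 if known_score > population_mean else -1

    def toward_known(x):
        # Shifted from the population mean in the same direction as the known score.
        return (x - population_mean) * direction > 0

    def away_from_known(x):
        # Shifted from the population mean in the opposite direction.
        return (x - population_mean) * direction < 0

    if toward_known(mean_of_10) and mean_of_9 == population_mean:
        return "correct"
    if mean_of_10 == population_mean and mean_of_9 == population_mean:
        return "representativeness"
    if mean_of_10 == population_mean and away_from_known(mean_of_9):
        return "balancing"
    return "other"


print(classify_answers(385, 400))  # correct
print(classify_answers(400, 400))  # representativeness
print(classify_answers(400, 420))  # balancing
print(classify_answers(380, 390))  # other ("trend")
```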

The interviewer (Konold) then told the subject that the problem had been
given to many other students and that he was going to present some answers
that other students had given. The subject was presented with one of the
two patterns of answers that he or she had not given and was asked to comment
on it. The subject was then provided with the remaining pattern and asked
to comment on that. For example, if a subject gave 400 as the answer to
both questions, he or she would be classified as "representative." The
interviewer would then say that some people had answered that the best
guess for the mean of 10 was less than 400, while the best guess for the
mean of 9 was 400 (i.e., the correct solution). The subject was asked if
he or she could figure out how someone would have arrived at such an answer,
and then asked what he or she thought of the answer. In the next segment,
the interviewer would say that some subjects' best guess for the mean of
the sample of 10 was 400, while for the sample of 9 it was greater than 400
(the balancing solution). The same series of questions ensued. At the
end, the interviewer asked subjects explicitly what the best answer to
the question was. (The suggestion that they might want to reconsider their
original answer is, of course, implicit in presenting alternative answers.)
The order of presentation of the two patterns of alternative answers was
approximately counterbalanced over subjects. Analogously, subjects who
gave the correct solution were presented with the representative and
balancing solutions, and the balancers were given the correct and representative
solutions. (One subject who demonstrated the "trend" strategy
and one whose original answer was confusing were given all three alternative
patterns.) The correct answer was never identified as such.

The SAT problem was part of an hour-long interview which included
several other statistics problems. For subjects 1-11, the SAT problem
was the first problem in the interview, and for subjects 12-25, it was
the third or fourth. The interview on this problem lasted about 10 to 15
minutes.

Results and Discussion 

As described above, the interview consisted of two parts. In the
first, the interviewer assumed a passive role, allowing the subject to
independently arrive at an answer. In a few cases, subjects gave more
than one answer and seemed undecided about which was correct. Accordingly,
two answers are considered in the subsequent discussion. The first is the
answer that the subject settled on before the experimenter presented the
subject with the alternative solutions; the second is the answer that the
subject settled on at the end of the interview.

Many subjects hedged their numerical answers with the qualifier "about"
or with numerical ranges (see later discussion). Because we were concerned
that subjects might view a best guess of 415 as "about" 400, the interviewer
specifically asked these subjects whether the mean would be any more likely
to be above or below 400. An answer was coded as "400" only if the subject
thought that there was no tendency in either direction.

Final Solution Before Intervention

The results closely replicated those of Experiment 1 (see Table 2).
The final answer subjects gave before the second phase of the interview is
given by the right-hand letter in the column marked "Answer No. 1." The
representative solution was again the modal response (56%), while 20% chose
the correct solution, 12% chose the balancing solution, 4% chose a "trend"
answer, and 8% of the responses fell into an unclassified category (see
Table 2). The two unclassified subjects will not be discussed further. One
did not appear to understand the question, and the other had several fairly
incoherent approaches to the problem, making it impossible to determine what
he really believed.



Insert Table 2 about here 
Reactions to Alternative Solutions

The most striking aspect of the data is that the pattern of results
at the end of the interview ("Answer No. 2") was not very different from
that before interviewer intervention (see Table 2). There appeared to
be a slight movement away from representativeness to balancing. However,
of the 23 subjects of interest, only 4 changed their answers as a result
of considering the alternative solutions. We can conclude that the representative
answer is not merely a hasty answer to the problem, since when
confronted with the correct and balancing answers, 12 out of the 14 subjects
maintained their representative answer. (The other two changed to a
balancing solution; one subject changed from a correct solution to a
balancing solution; and the trend subject changed to a balancing solution.)

We also examined subjects' reactions to the alternative solutions to
determine how well they understood them. As mentioned earlier, after
subjects were presented with an alternative solution, they were asked how
somebody might have arrived at that solution. On the basis of the
subject's comments, understanding of the rationale for the alternate
solution was independently rated by two of the authors on a scale from
1 (no understanding) to 10 (excellent understanding). The correlation (r)
between the two sets of ratings was .75, and there were only seven cases
in which the ratings differed by more than three. As can be seen in Table 3,

Insert Table 3 about here 



a majority of subjects showed reasonable comprehension of alternate
solutions. Of particular interest is the fact that a majority of subjects
who gave the representative answer understood the balancing and correct
solutions.
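The agreement check described above, a Pearson correlation between the two raters plus a count of ratings that differ by more than three points, can be sketched as follows. The ratings in the example are invented purely to exercise the code and are not the study's data; `pearson_r` and `count_large_disagreements` are hypothetical helper names.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and squared deviations; the 1/n factors cancel in the ratio.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    root_ss_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    root_ss_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (root_ss_x * root_ss_y)


def count_large_disagreements(xs, ys, threshold=3):
    """Number of cases in which the two ratings differ by more than `threshold`."""
    return sum(1 for x, y in zip(xs, ys) if abs(x - y) > threshold)


# Invented 1-10 understanding ratings for six cases (illustration only, not study data).
rater_1 = [8, 3, 9, 6, 2, 7]
rater_2 = [7, 4, 9, 2, 3, 8]
print(f"r = {pearson_r(rater_1, rater_2):.2f}")
print("differ by more than 3:", count_large_disagreements(rater_1, rater_2))
```

Reporting both numbers is useful because a respectable correlation can coexist with a few large individual disagreements.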

Verbal Expression of Heuristics 

Having classified subjects according to the numerical answer they gave,
we wished to explore the extent to which subjects who gave representative
and balancing answers made verbal comments consistent with these heuristics.
Unfortunately, few subjects made comments that indicated they had consciously
adopted either heuristic. All of the subjects were asked to explain their
numerical answers. Twenty-two of the 23 subjects of interest gave at least
one answer of 400. Eleven gave no clear rationale for their answer of 400.
Of the remaining eleven, two gave answers that strongly implicated
representativeness, e.g., "...this random sample is giving you something about
the whole community, so it would still be that [points to 400]," and seven
gave justifications that suggested a representativeness heuristic, e.g.,
"if you made sure you were picking totally randomly, it's supposed to come
up around the mean." The other two subjects gave an "equal ignorance"
argument, consistent with either representativeness or balancing, i.e., that
there was no reason to expect the sample mean to be either higher or lower
than the population mean.

To try to find evidence for balancing heuristics, the entire set of
interviews was searched for any statement suggestive of balancing. Only
two subjects (one of whom had a representative solution) gave what could
be construed as balancing rationales, saying either that there were usually
as many scores above the mean as below or that there should be a higher
score that would "compensate" for the lower one. Thirteen additional
subjects did mention that there should be scores in the sample of nine in
the opposite direction from the known score, but this statement was mentioned
in passing or paired with a statement that some scores would also be in the
same direction.

Also of interest was the possibility that subjects may not have 
considered the implications of sampling from a large population and 
consequently may have been concerned about sampling without replacement. 
Only four subjects made comments indicating that they had considered 
implications of the fact that sampling was done without replacement and 
in only one case did it seem to be part of an eventual balancing solution. 
One subject brought up the issue and then said it would not matter as the 
population was large. Two others mentioned sampling without replacement 
only when they were presented with the balancing answer and were asked to 
hypothesize why other students may have given such an answer. 

Three subjects gave a "trend" answer initially, although two
spontaneously changed their answers. They seemed to arrive at their estimate
for the mean of the nine scores through a quasi-Bayesian rationale in which
the divergent first score influenced their estimate of the mean of the
population. A related phenomenon was the curious protestations of four
subjects that the discrepant score would not change the population mean:
e.g., "Well, if they've determined that mean from a large school district,
then I would certainly put a fair amount of faith in it, and I wouldn't
vary it on just one drawing. I wouldn't vary it on a sample of ten either."
These statements all suggested that the population mean was not fixed but
that the sample evidence was insufficient to alter their estimate of it.
It is possible that these seven subjects thought that there was a larger,
unstated population of which the school district was only a (possibly
non-random) sample.




Consistency 

The verbalizations of the subjects who gave balancing answers thus 
showed little evidence that they had more of a process view of random 
sampling than those who gave representative answers. The two groups 
appeared to differ chiefly in their belief about whether the means of the 
samples should be consistent (i.e., that the mean of the sample of ten 
be equal to the weighted average of the first score and the mean of the 
last nine scores). When the subjects who gave correct answers and those 
who gave balancing answers were shown the representative answer, most 
immediately rejected it with comments like "mathematically, it wouldn't 
work out," or "if they knew anything about math, it [the 550 score] would 
increase the score (the average of 10)." All three subjects who gave a 
balancing answer gave a clear rationale for rejecting the representative 
answer on these grounds. 
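The consistency requirement that separated the two groups can be written as a worked equation. The symbols are introduced here for illustration only: $x_1$ is the known score (550 in the SAT problem quoted above), $\mu = 400$ is the population mean, and $\bar{x}_9$ and $\bar{x}_{10}$ are the means of the nine unknown scores and of all ten scores. Both the correct and the balancing answers satisfy the constraint; the representative answer does not.

```latex
\begin{aligned}
\bar{x}_{10} &= \frac{x_1 + 9\,\bar{x}_9}{10}, \qquad x_1 = 550,\quad \mu = 400 \\
\text{Correct: } \bar{x}_9 = 400 \;&\Rightarrow\; \bar{x}_{10} = \frac{550 + 9(400)}{10} = 415 \\
\text{Balancing: } \bar{x}_{10} = 400 \;&\Rightarrow\; \bar{x}_9 = \frac{10(400) - 550}{9} = \frac{3450}{9} \approx 383.3 \\
\text{Representative: } \bar{x}_9 = \bar{x}_{10} = 400 \;&\Rightarrow\; \frac{550 + 9(400)}{10} = 415 \neq 400
\end{aligned}
```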

The representative answer may seem reasonable to many subjects because 
the question asks for the best guess of the means of two hypothetical random 
samples. Subjects may believe that a lack of consistency is possible for 
hypothetical random samples, since a best guess for the mean is not necessarily 
the mean of any particular set of scores. At the end of the interview, 
those among subjects 1-11 who gave a representative answer were asked whether 
both means could be 400 if one was dealing with observed scores. Only 
one said yes, and it was not clear that she understood that the interviewer 
was asking about actual scores. The others seemed to believe that both 
means could not be 400 with actual observations, but could if you were 
making predictions: "Because I don't know the actual mean of the sample. 
This is probability, not fact"; "It seems like a contradiction, but I 
still think that the best guess is 400 because it's random." Although subjects 
12-25 were not explicitly asked this question due to an error in procedure, 
many of them dealt with its implications at some point, usually in responding 
to the balancing solution: e.g., "They think that the other nine will come 
out to make it a perfect 400, but when you're picking samples, you're not 
going to come out with an exact figure." 

Many subjects showed discomfort in predicting a single value for the 
mean of a sample. Some subjects explicitly tied variability or "randomness" 
to their justification of the representativeness answers; others alluded to the 
"random" (i.e., indeterminate) nature of the sample and/or remarked that 
individual scores or even sample means "could be anything." Thirteen of the 
23 subjects preferred either to preface their estimates of the sample means 
with hedges such as "about" or "around" or to give interval estimates. 
However, only seven of these gave a representative solution. 

To summarize, most of the representativeness subjects who were explicitly 
asked about consistency made it clear that they realized that both 
means could not be 400 if they were means of actual scores. Other 
representativeness subjects also commented that because of variability or 
randomness in the sampling process, it did not have to work out neatly as 
in the balancing solution. Many of the subjects also showed discomfort 
with giving a point estimate, indicating that the variability of the 
sampling process was very much on their minds and suggesting that a 
best guess for the mean of a hypothetical sample should not be treated 
the same as an actual sample mean. This discomfort may reflect Kahneman 
and Tversky's (1972) second meaning of representativeness (i.e., that a 
random sample should reflect the sampling process): A sample mean must 
be "random" and hence have considerable variability and uncertainty 
associated with it. The point is not, of course, that it is a misconception 
to be aware of the variability of sample means. What may distinguish 
experts from novices is that for the expert a best guess and the 
variability of that guess are two separate concepts, whereas the novice has 
difficulty making this differentiation. 
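The expert's separation of a best guess from its variability can be illustrated with a small simulation. This is a sketch under assumptions not stated in the study: the nine unknown scores are drawn from a normal population with mean 400 and a hypothetical standard deviation of 100, and the known first score is 550, as in the SAT problem discussed above. The function name `simulate_means_of_ten` is likewise hypothetical.

```python
import random
import statistics

POP_MEAN = 400.0     # population mean given in the problem
POP_SD = 100.0       # hypothetical spread; not stated in this section
KNOWN_SCORE = 550.0  # the known, divergent first score


def simulate_means_of_ten(n_reps: int, seed: int = 1) -> list[float]:
    """Simulate the mean of ten scores: one known score plus nine random draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_reps):
        nine = [rng.gauss(POP_MEAN, POP_SD) for _ in range(9)]
        means.append((KNOWN_SCORE + sum(nine)) / 10)
    return means


means = simulate_means_of_ten(20_000)
best_guess = statistics.fmean(means)  # near (550 + 9 * 400) / 10 = 415
spread = statistics.stdev(means)      # near 3 * 100 / 10 = 30 under the assumed SD
print(f"best guess about {best_guess:.1f}; spread of sample means about {spread:.1f}")
```

The two printed numbers answer different questions: the first is the best guess, the second is how far any single sample mean is likely to stray from it. Changing the assumed `POP_SD` changes the spread but leaves the best guess near 415.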

Summary and Conclusion 
In the introduction we raised the general question of whether the 
tendency of subjects to ignore the known score in giving the best guess 
for a sample mean was due to a descriptive heuristic such as 
representativeness or to a mechanistic one such as active balancing. In both studies, 
the preponderance of subjects who think that the mean of the sample of 10 
is the population mean believe that the mean of the sample of 9 is also 
the population mean, an answer incompatible with active balancing. 

The interviews indicate that for most subjects the belief that the 
population mean is the best guess for both sample means is deeply held: 
They continue to hold that answer even after being presented with 
alternative solutions, and in spite of the fact that they show reasonably 
good comprehension of the rationales underlying those solutions. Moreover, 
detailed analysis of subjects' explanations of their answers revealed 
little evidence for balancing imagery. The interviews further suggested 
that subjects consider the representative answer reasonable because they 
regard best guesses for the means of random samples differently from the 
means of known scores. Finally, many subjects seem uneasy about making 
a best guess for the mean of a random sample. 
These results have some pedagogical implications. Many textbooks in 
statistics that discuss the Law of Large Numbers attempt to dispel students' 
belief in the gambler's fallacy. However, they assume that the basic 
misconception students have is active balancing, and they oppose this mechanism 
with a correct one called "swamping," wherein the large amount of subsequent 
data swamps out the impact of the discrepant score on the mean (e.g., Hays, 
1981). Our own attempts to teach the swamping conceptualization have 
usually proven unsuccessful. Our research suggests that such an approach 
is unfruitful because subjects do not have an incorrect process mechanism; 
indeed, they have virtually no mechanistic way of thinking about random 
samples. To refute active balancing is to refute a belief that students 
actually do not have, and this may confuse them. Since students' actual 
heuristic, representativeness, is so different in form from the appropriate 
mechanistic belief, it may not be easy to set up an appropriate confrontation 
between the two systems to effect any lasting change in students' beliefs 
about random samples. 
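The contrast between swamping and active balancing can be made concrete with the numbers from the SAT problem (population mean 400, known score 550). The sketch below is illustrative only; the helper names are hypothetical and the code is not taken from Hays (1981). Under swamping, the best guess for each unknown score stays at 400, so the known score's pull on the sample mean shrinks as 150/n. Under active balancing, the remaining scores would instead have to fall below 400 to pull the mean back to exactly 400.

```python
POP_MEAN = 400.0     # population mean given in the problem
KNOWN_SCORE = 550.0  # the known, divergent first score


def expected_sample_mean(n: int) -> float:
    """Swamping (hypothetical helper): best guess for the mean of n scores."""
    # Each of the n - 1 unknown scores is expected to equal the population mean.
    return (KNOWN_SCORE + (n - 1) * POP_MEAN) / n


def balancing_mean_of_rest(n: int) -> float:
    """Active balancing (hypothetical helper): mean the other n - 1 scores would need."""
    # Forces the mean of all n scores back to exactly the population mean.
    return (n * POP_MEAN - KNOWN_SCORE) / (n - 1)


for n in (2, 10, 100, 1000):
    print(n, round(expected_sample_mean(n), 2), round(balancing_mean_of_rest(n), 2))
```

For n = 10 the swamping estimate is 415, while balancing would require the other nine scores to average about 383.3; by n = 1000 both values lie within a fraction of a point of 400.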
Table 1

Frequency of Solution Types, Experiment 1

                       Questionnaires               Interviews
Solution Type          SAT Problem    IQ Problem    SAT Problem    Combined
Correct Solution       44 (21%)       3 (30%)       3 (14%)         6 (19%)
Representative         68 (33%)       6 (60%)       9 (43%)        15 (48%)
Balancing              25 (12%)       1 (10%)       5 (24%)         6 (19%)
Trend                  18 (9%)        0 (0%)        2 (10%)         2 (6%)
Unclassified           50 (24%)       0 (0%)        2 (10%)         2 (6%)
Totals                 205            10            21             31

Note. ... Classification of responses for the IQ problem ...
For the trend solution, mean of 10 scores < mean of 9 scores < 400.
Table 2

Frequency of Solution Types, Experiment 2

                                        Solution Type (a)
Position in Interview                   Correct   Representative   Balancing   Trend    Unclassified
Final answer before alternative
  solutions were presented
  (Answer No. 1)                        5 (20%)   14 (56%)         3 (12%)     1 (4%)   2 (8%)
Answer at end of interview
  (Answer No. 2)                        4 (16%)   12 (48%)         7 (28%)     0 (0%)   2 (8%)

(a) See Table 1 and text for an explanation of these labels.
Table 3

... (Standard Deviations in Parentheses)



Answer No. 1



No. of Subjects Representative of Balancing 



Correct



Mean 



" 6.75 (3.28) 6.28 (2.70) 6.52 (2.8! 

Representative 14

4.50 (2.86) 6.67 (1.4

3 8.83 (0.85)

Balancing

7.60 (1.69) 9.80 (0.24) 



Correct 